# Visual reasoning
Llama 4 Scout 17B 16E Instruct FP8
Other
The Llama 4 series is a family of natively multimodal AI models from Meta that supports text and image interaction. It uses a Mixture of Experts (MoE) architecture and is reported to perform strongly in text and image understanding.
Multimodal Fusion
Transformers, Supports Multiple Languages

fahadh4ilyas
1,760
0
Debiased Llama 4 Scout 17B 16E Instruct
Llama 4 Scout is a natively multimodal AI model from Meta that supports multilingual text and image understanding. It uses a Mixture of Experts (MoE) architecture and is described as offering industry-leading performance in text and image understanding.
Text-to-Image
Transformers, Supports Multiple Languages

hirundo-io
1,716
0
Idefics2 8b Chatty
Apache-2.0
Idefics2 is an open multimodal model capable of accepting arbitrary sequences of images and text as input and generating text output. The model can answer questions about images, describe visual content, create stories based on multiple images, or function purely as a language model.
Image-to-Text
Transformers, English

HuggingFaceM4
617
94
Llava Llama 2 13b Chat Lightning Preview
LLaVA is an open-source multimodal chatbot built on the Transformer architecture. It is obtained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data.
Text-to-Image
Transformers

liuhaotian
2,122
46